IRIX Development Foundation for IRIX 6.4

Text File | 1997-09-11 | 48.6 KB | 1,123 lines

1. Patch SG0002211 Release Note

This release note describes patch SG0002211 to IRIX 6.4.

Patch SG0002211 replaces patch(es): SG0001815, SG0001856, SG0001954, SG0001978, SG0002056, SG0002117, and SG0002121.

1.1 Supported Hardware Platforms

This patch contains bug fixes for IP27 and IP30 platforms. The software cannot be installed on other configurations.

1.2 Supported Software Platforms

This patch contains bug fixes for IRIX 6.4 (version 1263561140). The software cannot be installed on other configurations.

1.3 Bugs Fixed by Patch SG0002211

This patch contains fixes for the following bugs in IRIX 6.4. Bug numbers from the Silicon Graphics bug tracking system are included for reference. For bugs fixed in prior patches, fix descriptions are grouped under the replaced patches.

+o Bug #459567 : Memory mapping of files with DMAPI managed regions did not trigger the correct DMAPI events for xfs file systems mounted "-o dmi".

+o Bug #484792 : mmap errors for file offsets > 2 GByte.

+o Bug #494445 : prctl(PR_SETEXITSIG, signal) doesn't provide the semantics needed by most multi-threaded applications.

   The semantics of PR_SETEXITSIG were defined at a time when parallel Fortran codes were the order of the day. In that world, if any thread exited the application for any reason whatsoever, the application needed to terminate. With multi-threaded applications there is still the desire to terminate the application if any of the threads terminate abnormally, but calls to exit() and exec() by a thread shouldn't cause application termination.

   This patch adds a new prctl(PR_SETABORTSIG, signal) which does exactly that. If any thread aborts due to a signal, the share group will be sent the specified signal. On the other hand, if a thread exits the share group via a call to exit() or exec(), the signal will not be sent.
   PR_SETEXITSIG and PR_SETABORTSIG are mutually exclusive; setting either one will nullify any previous setting of the other. As with PR_SETEXITSIG, doing a prctl(PR_SETABORTSIG, 0) disables the abort signal processing.

+o Bug #458133 : Large page tuneables should not have any limits.

   The large page tuneables (nlpages_*) used for reserving large pages at boot time had a limit of 64. This limit does not make sense and hampers kernel configurations for databases. The limits should be enforced based on the total memory in the system. The bug fix removes the max limits.

+o Bug #473859 : Tuneable to turn off mmap performance optimization for workstations.

   This bug adds a tuneable enable_devzero_opt to turn off the region coalescing optimization (adjacent regions are coalesced if they map the same file (/dev/zero) and have the same attributes).
   The optimization is very useful for X servers on workstations (it avoids search time across lots of regions) but is not very useful for large compute intensive machines. Turning off the optimization enables programmers to create multiple regions adjacent to each other (address space wise) to avoid the region lock bottleneck.

+o Bug #502996 : page_discard needs to support SBE page discarding.

   In case of SBE memory errors, we would like to not reuse the page after it is freed up by the using processes but allow the current users to access the page while they have a reference to it. This is now supported.

+o Rfe #502809 : Need new interfaces for UniCenter CA

   This patch has some interfaces that are needed for CA-UniCenter.

+o Bug #503126 : Turned off promlogging to remote nodes on NI errors.

+o Bug #504923 : Fix so diskless clients can boot (bug introduced in patch 1978).
+o Bug #505685 : BTE errors should dump hardware error state.

   This was fixed by doing a dump of the hardware error state before panicking on the bte crb error. Also the panic message has been expanded to include relevant CRB information.

+o Bug #506220 : idbg error on "vfs" command for DMAPI file system (e.g., file system mounted "-o dmi").

+o Bug #706050 : CPU 48: KERNEL FAULT SOFTWARE DETECTED SEGV

   This was a problem where sigtosharegroup didn't have any locking against exiting sproc processes; thus an exiting process could call detachshaddr, setting p_shaddr to null, while the caller of sigtosharegroup was trying to use the p_shaddr field.

+o Bug (unreported) : Optimal assignment of I/O boards to nodes was incorrect

   It was previously possible for the assignment of a node to control a given I/O board to be different from the documented assignment, due to an off-by-one error. This patch includes a fix that makes the assignment conform to documented assignments.

+o Bug #453414 : SysV semaphores - sempid wrong for pthreads

   The sempid field was incorrectly using the sproc PID instead of the shared process PID. For pthread apps this meant that sempid might not match getpid() even though only threads from the same process accessed the semaphore.

+o Bug #501616 : fo_scsi_lun_remove was not in the failover stubs module, requiring the inclusion of failover.o in diskless kernels.

+o Bug #501507 : Race condition in mon_trace_switch

   Fix a race condition in mon_trace_switch(). The kernel cannot depend on the value of a variable read before grabbing the lock. The variable needs to be read again after grabbing the lock, and before dereferencing it as a pointer.

+o Bug #507073 : MD Directory error register reporting is wrong

   Hub Memory interface error register bit field decoding was incorrect. Error dumping code was not decoding one of the fields.
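
The race fixed for Bug #501507 is an instance of a general locking rule: a pointer sampled before taking a lock may be stale by the time the lock is held. The following is a minimal user-space sketch of that rule in C with POSIX threads. It is not the kernel's mon_trace_switch() code; struct trace_buf, trace_install(), and trace_record_switch() are hypothetical names chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical trace buffer; stands in for whatever the lock protects. */
struct trace_buf {
    int switches;
};

static pthread_mutex_t trace_lock = PTHREAD_MUTEX_INITIALIZER;
static struct trace_buf *active_trace;      /* protected by trace_lock */

/* Install a buffer, or pass NULL to turn tracing off. */
void trace_install(struct trace_buf *tb)
{
    int rc = pthread_mutex_lock(&trace_lock);
    assert(rc == 0);
    active_trace = tb;
    pthread_mutex_unlock(&trace_lock);
}

/*
 * Record one context switch. Returns 1 if it was recorded, 0 if tracing
 * is off. The racy pattern would copy active_trace into a local BEFORE
 * locking and dereference that copy afterwards; another thread could
 * call trace_install(NULL) in between. Here the pointer is read only
 * while the lock is held.
 */
int trace_record_switch(void)
{
    struct trace_buf *tb;
    int recorded = 0;
    int rc = pthread_mutex_lock(&trace_lock);

    assert(rc == 0);
    tb = active_trace;                      /* read under the lock */
    if (tb != NULL) {
        tb->switches++;
        recorded = 1;
    }
    pthread_mutex_unlock(&trace_lock);
    return recorded;
}
```

A caller installs a buffer with trace_install(&buf), calls trace_record_switch() at each event, and disables tracing with trace_install(NULL). An unlocked early check can still be used as a fast path, provided the pointer is read again once the lock is held.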
Bugs fixed in Patch SG0002121:

+o Bug #427056 : vnode pcache race

+o Bug #489537 : gang scheduler hang

+o Bug #491852 : gang scheduler problem in patch 1978

+o Bug #449470 : prreaddir returns bad data if multiple pids go to same slot

   This led to the possibility that ps or ls /proc may list incorrect data. If one was very unlucky, the bug could lead to stack corruption within the kernel, with the possibility of a resulting crash. This bug was never observed in the field, but was discovered by code inspection.

+o Bug #483959 : improved mlockall() handling with MCL_FUTURE flag

   Prevent a gfx application from mistakenly getting a SIGSEGV when using mlockall(3C) with the MCL_FUTURE flag.

+o Bug #481501 : AW- reboot on the Octane & Onyx2 running MRed code

+o Bug #486400 : ISV app crashes system

+o Bug #486264 : kernel panic when running frame4

   Each of these bugs resulted in a machine ASSERT failure with the following message:

      assertion failed cpu 0: (rp->r_refcnt > 1) || !(flags & RF_EXITING), file: ../os/region.c, line: 1006

   This was caused by a bug in close-on-exec processing for sproc processes, and was in fact the same bug fix as for bug #484611. Bug 484611 was fixed in kernel rollup patch 1978.

+o Bug #491891 : io_spunlock() needs to be improved for IP27

   io_spunlock() needs to make sure that the PIO operations launched by the processor holding the lock go in order before the lock is released. This fix forces a sync operation to force all PIO operations to reach a hardware domain where PIOs are always in order.

+o Bug #491895 : Hub 2.1 workaround

   This is a workaround to reduce or eliminate cache interventions, which helps to avoid hitting one of the problems in Hub 2.1.

+o Bug #494592 : Better error message

   Error messages on a bus error were made more user friendly by including the module/slot information.

+o Bug #497013 : Cached read directory error

   The error message on a cached read directory error was made more user friendly by including the module/slot information.
+o Bug #497729 : Disabling CPUs produces alarming message at boot

   Warning messages during the volunteer-for-widget phase of xbow io initialization have been masked for headless nodes.

+o Bug #500585 : Wrong register is being read in router error state retrieval

   RR_PORT_PARMS and RR_STATUS_ERROR registers were being swapped while printing the router error state and this has been fixed.

+o Bug #705897 : ORIGIN PROGRAM FAILS WITH F77 7.2 USING -O3

   This was a bug in the floating point emulation code in the kernel. If a floating point exception is taken on an instruction in a branch delay slot, the kernel must emulate the branch in order to compute the proper program counter for the faulting program. The emulation code for the MIPS4 bc1t/bc1f family of instructions was incorrect, thus resulting in an incorrect program counter when the user program was restarted after the exception.

Bugs fixed in Patch SG0001978:

+o Bug #432166 : panic due to tlbmiss in trilevel_pte()

+o Bug #433662 : Processes Can Hang on Isolated/Restricted Processor.

   When a processor is isolated or restricted, usually as part of running a real-time application, other processes which are not mustrun onto the isolated/restricted processor can be hung. This bug has been particularly observed while running Mediabase applications.

+o Bug #458212 : ipcs doesn't report outstanding shared memory

+o Bug #462005 : The attr_multi system call produced errors if the count of operations was greater than 1.

+o Bug #463762 : Device interrupt allocation could not be done in spite of interrupt bits being available.

   This fixes a bug in the interrupt target selection process on a particular hub where only one cpu is enabled. Also the interrupt target selection algorithm is made more generic.
+o Bug #464148 : In order to support extremely large I/O configurations, the number of hwgraph vertexes that the kernel can handle is now controlled through a static tunable in stune/kernel, "hwgraph_num_dev". The default value should be sufficient for the vast majority of installations.

+o Bug #466601 : sbrk system call should increase region size based on page size

   This is a performance enhancement. It allows programs that do a lot of small mallocs (like C++ programs) to use large pages effectively.

+o Bug #468034 : This patch allows independent processes to share the kernel data structures that describe their address space. These data structures are called Page Tables and contain information about the virtual to physical address translation.

   A big benefit of sharing Page Tables is speed. Any new process attaching to the SHM segment benefits from the page faulting activity performed by other attached processes. This dramatically reduces the number of page faults and makes a great difference in the overall performance.

   This patch is highly recommended for installations running large Oracle databases. Processes that want to make use of this feature should specify a special flag when calling shmat. This option is only available if both the attaching address and the size of the SHM segment satisfy appropriate restrictions. See shmat(2) for detailed information.

+o Bug #468287 : The kernel routine which allocated user virtual address space was very inefficient when there were a large number of mappings.
+o Bug #468904 : Weightless processes slow system response in multiprocessor machines

   Weightless processes compete effectively with normal timesharing processes, causing erratic interactive behavior. This patch searches more extensively for time-sharing threads before running weightless threads.

+o Bug #469295 : Kmem_zone_alloc() should take a policy parameter

   Zone allocator now accepts a parameter to indicate the radius of the search to get the memory for a zone request. This is useful to avoid zone size bloats when lots of processes are started and killed.

+o Bug #472156 : par can hang system

   This moves the fawltysched() call down after the kthread is unlocked. Calling fawltysched() while the kthread is locked can lead to deadlocks.

+o Bug #473350 : Can't do copy-on-write from read-only vnode region

+o Bug #473757 : ipcs does not report outstanding shared memory

+o Bug #473776 : Large pages can cause crashes due to inconsistent PTEs

   The PM policy synch code did not check to make sure that the pte bits are consistent for all the base pages of the large page. In this case some of the ptes had the mod bit set. This caused a large page to be formed with some ptes having the mod bit set and some not having the bit set.

+o Bug #474576 : ipcs on 6.4 broken - duplicate of bug #473757.

+o Bug #474898 : NLM cancel requests were not always properly honored.

+o Bug #475414 : PIO errors during probing should not be reported

+o Bug #475765 : DFS support needed to be added to the kernel.

+o Bug #475913 : Coalesced performance improvements

+o Bug #476706 : Panic messages need to be logged in the flashlog for IP27 systems.

+o Bug #477990 : Fixed the chunk cache to free up clean memory more proactively instead of waiting until freemem gets really low.
+o Bug #478654 : Power failure data corruption (note: possible real-time impact)

   Abrupt loss of AC power to an Origin or Onyx2 system during I/O operations may cause a small amount of corrupt data to be transmitted or committed to disk, which can be a fatal problem, especially in database applications. This workaround prevents this by immediately halting all I/O when the system controller (MSC) power failure early warning is detected. Impact on real-time system performance is possible with old MSCs; affected users may eliminate this possibility by changing the systune variable ignore_sysctlr_intr to 1 or replacing the older MSC.

+o Bug #480640 : sigwait would not work properly with pthreads programs.

   A pthread that blocked a particular signal then attempted to wait for the signal via sigwait(3) or sigtimedwait(3) would not be notified of the signal's delivery.

+o Bug #481414 : Reverse maps need to grow in smaller steps

   The reverse map needed to grow in much smaller steps than it was. It was taking up too much memory in large memory machines if more than 15 processes share the memory. With much smaller steps the memory use came down from 1.4G to 295M.

+o Bug #483044 : cache error type=interface messages are confusing

   In the case of "Type=Interface", we should not print the word "cache" at all. Instead, the message should say "System Interface Error" or "Memory Error".

+o Bug #483048 : Fixed a bug where error_dump is not getting called on certain kinds of "Kernel Data Bus Error" panics.
+o Bug #483683 : Time-slice end not respected on CC-NUMA systems.

   Memory affinity code on CC-NUMA systems overrides time-slice end, allowing processes to run for extended periods without rescheduling.

+o Bug #483978 : Origin/Onyx2 vme support

   Fix edtinit path to allow Origin2000 VME devices to be probed and the device driver loaded.

+o Bug #484353 : Made sure global_buf_table points to initialized memory to avoid kernel panics while recycling a buffer.

+o Bug #484611 : close-on-exec not handled properly for sproc processes

   A fix is included to properly close file descriptors marked as close-on-exec. Previously, they were not properly closed for sproc processes that exec'ed. Detected at several sites that tried to run gaussian.

+o Bug #484659 : Race condition in trilevel_pte

   There was a race condition in trilevel_pte which caused the segtable to be freed twice.

+o Bug #484690 : Added single-bit ECC error monitoring features. This allows the detection of stuck data lines that may otherwise go unnoticed because they are transparently corrected as single-bit errors.

+o Bug #484698 : debug() should check to make sure kdebug is set before trapping

   We should not attempt to call the debugger if it isn't loaded.

+o Bug #484708 : Added board serial numbers in hardware error state.

+o Bug #484714 : Fixed sending of panic interrupts to the rest of the cpus from the cpu which is handling an nmi.

+o Bug #485110 : Overlapping memory placement for multiple parallel jobs.

   Multiple parallel jobs often get placed on nodes which are already in use even when there are free nodes available.
   This bug can dramatically decrease performance for large throughput runs which include multiple parallel jobs.

+o Bug #485318 : BTE disabling information should be made more user friendly.

   Change the message that gets printed when a BTE gets disabled to go to the console buffer. Also indicate it will be restarted when the system reboots. Make it a notice instead of a warning.

+o Bug #489412 : shmget fails when size > 2GB using 64bit ABI

   Correct data types so that the kernel now honors the creation of large shared memory areas specified with 64bit sizes.

+o Bug #490636 : Extract the correct serial number from the nic information in case of multiple nic information entries being stored for a single node board.

+o Bug #492365 : curaspm() macro should return proper pointer.

   Fix the way the aspm pointer was returned to the caller.

+o Bug #704587 : Swap and dump devices could not be specified off of the root disk.

   Previously, the kernel attempted to open the swap and dump devices early in the boot sequence, when only the root device was in the hardware graph. With this patch, non-default swap and dump devices are set up after the hardware graph is fully initialized. Specify these devices as full pathnames, for example, /dev/dsk/dks0d2s1.

   NOTE: besides this patch, a separate patch to /sbin/ioconfig is required to use non-default swap and dump devices.

+o Kernel fixes that enable patch 1992 to fix the ipcs command and address problems with SysV shm reporting. Note that the fixes in this patch don't actually fix the problems (reported in 458212, 473757, 474576). This patch satisfies the kernel prerequisites for patch #1992, which fixes those problems.
+o Bug #500607 : Origin low-level interrupt code fails to handle NULL dev_desc

   Origin systems now correctly accept a NULL dev_desc parameter in calls to *_intr_alloc. The result will be a threaded interrupt handler, the same as if the default dev_desc for the device had been passed in.

Bugs fixed in Patch SG0002056:

+o Bug #477391 : New ioctl PIOCGETINODE for /proc to get inode information about a debugged process' files

Bugs fixed in Patch SG0001856:

+o Added support for new IP29 board

   A change in the physical IP29 board required kernel support. Boards with part number 030-1244-001 are supported by this patch.

+o Bug #473951 : On OCTANE, improve performance when checking CPU status.

   Use a cached variable to determine whether a cpu is enabled or not instead of doing 2 pio reads to heart; fix the loop that calculates maxcpus.

+o Bug #472570 : race in early bootup affects small machines

+o Bug #472381 : Bug in page fault handler

   Kernel would panic in vfault when faulting in a demand zero fill page due to an invalid attribute structure reference. The attribute structure was becoming invalid due to a temporary release of the region lock while zeroing out a page, in order to increase parallelism. This bug had a high probability of occurrence when running highly multithreaded applications, especially when portions of the shared address space were being pinned. Another manifestation of this bug was an application hanging in an unkillable state.

+o Bug #472362 : Read the corresponding int pend registers after clearing the interrupt to avoid a race where the bit gets cleared much later, causing us to lose interrupts.
+o Bug #472121 : There is a race in the hardware error saving code that causes the FRU to give a bogus analysis if we get a cache error while we are saving the error state and panicking.

+o Bug #472041 : Added support to turn off bypassing in the router on IP27

+o Bug #471664 : Nsort program crashes while using shared memory

+o Bug #471654 : Memory errors can go unrecorded due to speculation

   Multiple uncorrectable errors could cause the md error register to be set due to speculation on the local node. However, software has no indication of this since we don't see an interrupt or cache errors. When we do get the real error on another page, the error register still holds the first error and the multiple error bit gets set in the register. Since the error address does not match the address in the register, the page does not get discarded. This allows the page to get reused and we finally panic, but since the bad address is not logged anywhere, we cannot report the error correctly.

+o Bug #471021 : Correction to Fetch+Op cache flushing

   The Fetch+Op cache needs to be flushed when the page that's allocated for Fetch+Op operation is being freed. Reusing this page without flushing could lead to problems.

+o Bug #470333 : Improvement of memory error messages on IP30.

+o Fix checking of unique id (uuid)

   One case of unique id (uuid) comparison in the kernel was incorrect; also the error codes returned for different flavors of invalid uuids were not in compliance with the DCE specification.
+o Bug #467176 : On IP27 systems, the kernel may panic with CrayLink network timeout messages.

   For IP27 systems, the aging of messages, which facilitates message delivery without starvation, was not set up right. This could cause the machine to panic since some messages time out after being starved for a long time. This bug especially affects configurations with a large number of cpus. This bug has been fixed in this patch.

+o Bug #465295 : Improper calculation of starting virtual address

   Kernel fault when running a third-party data-mining application.

+o Bug #466237 : Fix to system call bug that may cause a system panic in syssgi(2) using SGI_RT_TSTAMP_UPDATE

+o Bug #465025 : Fix to system call bug that may cause a system panic in setcontext(2).

+o Bug #465061 : Fix to system call bug that may cause a system panic in syssgi(2) using SGI_SPROFIL as the request.
+o Bug #464708 : Fix so diskless clients can boot

+o Bug #464517 : Bug in kernel's emulate_branch code.

   This scenario can happen whenever there is a floating point instruction in the shadow of one of these branches. Found because the exponential function in libfastm was sometimes failing.

Bugs fixed in Patch SG0001954:

+o Bug #470142 : panic due to null p->p_shaddr in irix5_prgetpsinfo()

Bugs fixed in Patch SG0001815:

+o Bug #463622 : Device drivers trying to map kernel memory to user address space could panic the system

   Kernel would panic in the spec_unmap() routine when a user level process tried to invoke a mmap(2) system call to its device driver. The problem was that the driver was asking the kernel to allocate memory. In response the kernel would return an address in kernel virtual memory space (XKSEG). The driver would then try to map this address to user address space. The interface to do this mapping was incorrectly checking this kernel virtual address range, and would end up returning an error for the mapping. The error return path for the mmap(2) system call would then cause the above panic. This bug would be triggered only if device drivers try to allocate kernel memory greater than a single page size (16Kbytes).
+o Bug #460221 : Systems with just one running processor would cause hangs.

   This bug would get triggered only on systems with one processor. In these systems, the utlbmiss code path for single cpu Origin 2000 and Origin 200 was broken since the functions that selected and removed the (switchable) utlbmiss handlers were not consistent. That is, for single cpu origins, utlbmiss_resume always patches the utlbmiss code in one way, whereas utlbmiss_reset does not undo the patch correctly. Always using the mp case for origins fixes the problem.

+o Bug #463665 : Isolating processors on Origin systems causes kernel panic.

   When a processor is isolated, code in locore attempts to update the p_kvfault array, which contains a bit for each kernel virtual address and indicates that the processor faulted on that XKSEG address. This is needed since isolated processors do NOT have their tlbs synced with the other processors unless they have faulted on the addresses being freed. When the kernel became mapped, the first 32 MB of XKSEG space was removed from the sptmap but was left in the kptbl. This requires us to bias the address used in accessing the p_kvfault array.

+o Bug #484706 : System Interface Error Reported as Cache Error.

   System interface errors in the R10000 are reported through the cache error register, and were thus reported by the kernel as cache errors (which is misleading). System interface error messages are now printed in these cases.
+o Bug #483230 : Made an optimization in the code dealing with shaddr sproc processes which are being debugged and had locked instruction pages. Each sproc being debugged would get its own copy of all the locked instruction pages, leading to bloat. This page copying has been minimized so that only the page which is being modified by the debugger (e.g., for setting breakpoints) will be made private to the target sproc.

+o Bug #496469 : hubdev_callouts can hang a machine.

   hubdev_callouts holds a spinlock and calls functions which could in some rare cases go to sleep. This was observed mainly on systems where both cpus on a node board had been disabled. The fix has been to change the spinlock to a mutex.

+o Bug #484928 : N32 program causes machine not to make progress.

   There are cases where a cpu appears not to make progress while running a program. There now exists a periodic check from each cpu whether forward progress is being made by all the other cpus. Appropriate action is taken if any cpu does not seem to be making progress.

+o Bug #504612 : /proc psinfo gives wrong sched class.

   Correct reporting of scheduling classes caused by incorrect check of PR_SPID.

+o Bug #506980 : N on N performance degradation.

   In these cases parallel jobs can get placed such that memories get reused, causing extreme slowdown when used in conjunction with mustrun.
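
The behavior restored by the fix for Bug #480640 can be checked with a small POSIX threads program: a thread that has a signal blocked should still receive it through sigwait(3). The following is a minimal sketch under that assumption, written against standard POSIX calls only; wait_for_usr1() and sigwait_in_pthread() are hypothetical names, and the choice of SIGUSR1 is arbitrary. It is not taken from the patch or from any SGI test suite.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Thread body: synchronously accept SIGUSR1 and store its number. */
static void *wait_for_usr1(void *arg)
{
    int *received = arg;
    sigset_t set;

    assert(received != NULL);
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    /* SIGUSR1 is blocked here (mask inherited from the creating thread),
       so it stays pending until sigwait() accepts it. */
    if (sigwait(&set, received) != 0)
        *received = -1;
    return NULL;
}

/* Returns SIGUSR1 if the waiting thread was notified, -1 otherwise. */
int sigwait_in_pthread(void)
{
    sigset_t set, saved;
    pthread_t tid;
    int received = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    /* Block before pthread_create() so the new thread starts blocked. */
    if (pthread_sigmask(SIG_BLOCK, &set, &saved) != 0)
        return -1;
    if (pthread_create(&tid, NULL, wait_for_usr1, &received) != 0) {
        pthread_sigmask(SIG_SETMASK, &saved, NULL);
        return -1;
    }
    /* Direct the signal at the waiter; if that fails, cancel the thread
       (sigwait() is a cancellation point) so the join cannot hang. */
    if (pthread_kill(tid, SIGUSR1) != 0)
        pthread_cancel(tid);
    pthread_join(tid, NULL);
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    return received == SIGUSR1 ? SIGUSR1 : -1;
}
```

On a system with the described defect the waiting thread would presumably never return from sigwait(), so a real regression test would wrap the call in a timeout, for example by using sigtimedwait(3) in the thread instead.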
1.4 Subsystems Included in Patch SG0002211

This patch release includes these subsystems:

+o patchSG0002211.dev_man.irix_lib

+o patchSG0002211.eoe_hdr.lib

+o patchSG0002211.eoe_man.unix

+o patchSG0002211.eoe_sw.kdebug

+o patchSG0002211.eoe_sw.unix

1.5 Installation Instructions

Because you want to install only the patches for problems you have encountered, patch software is not installed by default. After reading the descriptions of the bugs fixed in this patch (see Section 1.3), determine the patches that meet your specific needs.

If, after reading Sections 1.1 and 1.2 of these release notes, you are unsure whether your hardware and software meet the requirements for installing a particular patch, run inst. The inst program does not allow you to install patches that are incompatible with your hardware or software.

Patch software is installed like any other Silicon Graphics software product. Follow the instructions in your Software Installation Administrator's Guide to bring up the miniroot form of the software installation tools.

Follow these steps to select a patch for installation:

1. At the Inst> prompt, type

      install patchSGxxxxxxx

   where xxxxxxx is the patch number.

2. Initiate the installation sequence. Type

      Inst> go

3. You may find that two patches have been marked as incompatible. (The installation tools reject an installation request if an incompatibility is detected.) If this occurs, you must deselect one of the patches.

      Inst> keep patchSGxxxxxxx

   where xxxxxxx is the patch number.

4. After completing the installation process, exit the inst program by typing

      Inst> quit

1.6 Patch Removal Instructions

To remove a patch, use the versions remove command as you would for any other software subsystem. The removal process reinstates the original version of software unless you have specifically removed the patch history from your system.

      versions remove patchSGxxxxxxx

where xxxxxxx is the patch number.

To keep a patch but increase your disk space, use the versions removehist command to remove the patch history.

      versions removehist patchSGxxxxxxx

where xxxxxxx is the patch number.

1.7 Known Problems